Phase diagram of epidemic spreading - unimodal vs. bimodal probability distributions 
Disease spreading on complex networks is studied in the SIR model. Simulations on empirical complex networks reveal two specific regimes of disease spreading: local containment and epidemic outbreak. The variables measuring the extent of disease spreading are, in general, characterized by a bimodal probability distribution. Phase diagrams of disease spreading for empirical complex networks are introduced. A theoretical model of disease spreading on m-ary trees is investigated both analytically and in simulations. It is shown that the model reproduces the qualitative features of the phase diagrams of disease spreading observed in empirical complex networks. The role of the tree-like structure of complex networks in disease spreading is discussed.
I. INTRODUCTION

The spreading of epidemics on complex networks has been a subject of intensive research during the last decade [1, 2, 3]. The importance of this line of research is evident both in disease control and prevention and in the spreading of all forms of malicious software in computer and communication networks. A crucial element of a successful epidemic model is a good description of the epidemic spreading pathways. Complex networks were found to be a good description of social contact networks [4, 5], whereas the physical structure of IT and communication networks directly qualifies them as complex network systems [6, 7]. The "paradigmatic" characteristics of complex networks, such as the "small-world network" property [8] and the "scale-free network" property [9, 10], profoundly influence the patterns of epidemic spreading. The roles of intercontinental air travel and of highly connected hubs in the onset and fast spreading of a pandemic are particularly important [11, 12].

Epidemiological models, such as the SIR model used in this paper [13], describe the stochastic process of disease spreading along the complex network pathways. The model parameters are p, the probability per time step that a susceptible node gets infected by an infected neighboring node, and q, the probability per time step that an infected node recovers. A prominent question in the dynamics of epidemiological models is the existence of thresholds for the onset of an epidemic. In a homogeneous-mixing model based on the mass-action principle, the condition for the onset of an epidemic is formulated in terms of the basic reproduction ratio $R_0$ as $R_0 > 1$. On the other hand, studies of disease spreading on scale-free complex networks in the SIS model showed the absence of an epidemic threshold [14]. In this paper, the reasonable notion that the structure of the disease spreading pathways influences the conditions for the onset of an epidemic is elaborated and quantified. From the stochastic nature of the SIR model it is easy to single out two specific patterns (regimes) of disease spreading: the disease may "die out" after a (small) number of steps, or the epidemic may spread throughout the entire network. However, in empirical complex networks the transition between these two regimes is not abrupt. There exists a considerable segment of the parameter space in which the two regimes coexist, i.e. there is a non-negligible probability for the appearance of each of them. In this paper we study the conditions for the appearance of these regimes, examine the possibility of their coexistence, and construct the phase diagram of epidemic spreading. Furthermore, we propose a simple theoretical model which qualitatively reproduces the observed phase diagrams.



II. PHASE DIAGRAM OF EPIDEMIC SPREADING

Simulations of disease spreading in the SIR model on empirical networks reveal an interesting interplay between the network structure and the SIR model parameters. The variables used to measure the extent of disease spreading are the number of infected nodes and the epidemic range. The number of infected nodes is defined as the number of nodes that got infected at any moment during the spreading of the disease. The epidemic range is defined as the maximal number of steps that the infection travelled from the initially infected node. We study disease spreading on a number of empirical complex networks; in particular, we present detailed results of simulations for the 2003 condensed matter collaboration network introduced in

Let us start with an observed feature which is central for the purpose of this paper: the probability distribution of the number of infected nodes is bimodal for some values of the model parameters p and q, as presented in Fig. 1 and Fig. 2. This feature was already studied and reported for the class of simulated scale-free and small-world networks in [15]. We further observe a bimodal probability distribution of the epidemic range, as shown in Fig. 3. This coincidence in the character of the probability distributions strongly suggests that bimodality is an important feature of disease spreading in the SIR model on this complex network. Our goal in this paper is to find out to what extent the observed behavior is generic, i.e. at least qualitatively the same for different SIR model parameters, different initially infected nodes and even different complex networks.
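The measurement of the two variables can be illustrated with a short Python sketch. It is not the authors' code and rests on assumptions the text does not state: synchronous discrete time steps in which every infected node first tries to infect each susceptible neighbor with probability p and then recovers with probability q; newly infected nodes become infectious in the next step; and the epidemic range is taken to be the largest number of transmission hops from the initially infected node. A random Erdős–Rényi test graph stands in for an empirical network, and `sir_run` and `random_graph` are hypothetical names.

```python
import random
from collections import Counter


def sir_run(adj, source, p, q, rng):
    """One discrete-time SIR realization under the assumed update rule.

    Returns (number of nodes ever infected, epidemic range in transmission hops).
    """
    assert 0 < q <= 1, "q = 0 would keep infected nodes active forever"
    hops = {source: 0}      # every node ever infected -> hops from the initial node
    infected = {source}
    while infected:
        newly = {}
        for u in infected:
            for v in adj[u]:
                # only susceptible nodes: never infected and not already hit this step
                if v not in hops and v not in newly and rng.random() < p:
                    newly[v] = hops[u] + 1
        # nodes that were infectious during this step recover with probability q
        infected = {u for u in infected if rng.random() >= q}
        hops.update(newly)
        infected |= set(newly)
    return len(hops), max(hops.values())


def random_graph(n, mean_degree, rng):
    """Hypothetical Erdos-Renyi test graph as an adjacency dict of sets."""
    adj = {i: set() for i in range(n)}
    prob = mean_degree / (n - 1)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < prob:
                adj[i].add(j)
                adj[j].add(i)
    return adj


if __name__ == "__main__":
    rng = random.Random(1)
    adj = random_graph(400, 4.0, rng)
    source = max(adj, key=lambda v: len(adj[v]))  # a well-connected initial node
    runs = [sir_run(adj, source, 0.3, 0.7, rng) for _ in range(500)]
    # Coarse histogram of the infected fraction; a bimodal outcome would appear as
    # mass in the first bin plus a separate group of bins far from zero.
    hist = Counter(min(int(10 * size / len(adj)), 9) for size, _ in runs)
    for b in range(10):
        print(f"{b / 10:.1f}-{(b + 1) / 10:.1f}: {hist[b]}")
```

Collecting the epidemic range from the same runs (the second tuple element) gives the corresponding histogram of the range.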

A crucial question is whether, for the chosen complex network and initially infected node, the parameters p and q have to be fine-tuned to produce bimodal probability distributions of the variables measuring the extent of disease spreading. To answer this question, we study the entire (p, q) parameter space of the SIR model: the [0, 1] × [0, 1] square. For each pair of (p, q) values we plot the value of the variable measuring the extent of disease spreading. For reasons which will soon become evident, we call the obtained plots the phase diagrams of epidemic spreading. An example of a phase diagram for the epidemic range is given in Fig. 4. The quantity presented in this phase diagram is the cumulative probability of a finite epidemic range. For a complex network of finite size, the practical method of calculating this cumulative probability is based on the bimodal character of the epidemic range distribution, see Fig. 3: the cumulative probability is simply the sum of the probabilities of the epidemic range from 0 up to the range at which the probability drops to zero. From the phase diagram we can easily identify two extreme regimes. The first one is characterized by high q and low p, where the cumulative probability of a finite epidemic range tends to 1. The second regime appears at high p and low q, where the cumulative probability of a finite epidemic range tends to 0. The existence of these two regimes in their respective ranges of the (p, q) parameters is not surprising, given the very meaning of these parameters. However, the phase diagram reveals a non-negligible area of the parameter space in which the cumulative probability of a finite range differs significantly from both 0 and 1. This area is the transitional area connecting the two extreme regimes of local containment and epidemic. Within this region, it is comparably probable that the disease spreading will be contained or that it will explode into an epidemic.

In Figs. 5-8 we present phase diagrams for the number of infected nodes. In Fig. 5 we see that at low p and high q the number of infected nodes is low compared to the total number of nodes in the network. On the other hand, in the area of high p and low q the number of infected nodes is of the order of the total number of nodes in the network. The areas of the parameter space in which these two extreme regimes are realized are connected by a broad transitional area. A comparison of the phase diagrams in Figs. 4 and 5 shows that the transitional areas in both phase diagrams correspond closely. The phase diagram for the standard deviation of the number of infected nodes, depicted in Fig. 6, reveals that the standard deviation is large precisely in the transitional area identified in Figs. 4 and 5. It should be stressed that this is not an artifact of an insufficiently large sample of simulations: in the transitional area of the parameter space, the standard deviation of the number of infected nodes saturates at a finite value as the sample size increases, see Fig. 7. This observation is fully consistent with the bimodal character of the probability distribution of the number of infected nodes. Finally, in Fig. 8 we present the length of the normalized ±3 standard deviation interval of the number of infected nodes, defined as

$$(l, u) = \left(\max\left(0, \frac{E(Y) - 3\sigma(Y)}{N}\right), \min\left(\frac{E(Y) + 3\sigma(Y)}{N}, 1\right)\right),$$

where Y is the random variable of the number of infected nodes, σ(Y) is its standard deviation and N is the total number of nodes in the network. According to Chebyshev's inequality, the probability of Y/N taking a value in the interval (l, u) is at least 1 - 1/3^2 = 8/9 ≈ 88.89%.
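Both summary quantities used in these phase diagrams can be computed directly from a sample of simulation outcomes. The sketch below is a hedged illustration, not the authors' code: `finite_range_probability` follows the recipe in the text (sum the empirical probabilities of the epidemic range from 0 up to the first range value with zero probability), and `three_sigma_interval` evaluates (l, u) with the population standard deviation, which is an assumption since the text does not specify the estimator. The sample values in the usage part are invented for illustration.

```python
import math
from collections import Counter


def finite_range_probability(ranges):
    """Sum empirical P(range = r) for r = 0, 1, ... up to the first unobserved r.

    For a bimodal range distribution, the first empty bin separates the
    locally contained outbreaks from the network-wide epidemics.
    """
    counts = Counter(ranges)
    total = len(ranges)
    acc, r = 0.0, 0
    while counts[r] > 0:
        acc += counts[r] / total
        r += 1
    return acc


def three_sigma_interval(infected_counts, n_nodes):
    """Normalized +-3 sigma interval (l, u) of Y/N, clipped to [0, 1], and its length."""
    k = len(infected_counts)
    mean = sum(infected_counts) / k
    sd = math.sqrt(sum((y - mean) ** 2 for y in infected_counts) / k)  # population sd
    lo = max(0.0, (mean - 3 * sd) / n_nodes)
    hi = min((mean + 3 * sd) / n_nodes, 1.0)
    return lo, hi, hi - lo


if __name__ == "__main__":
    # Illustrative bimodal sample (invented numbers, not data from the paper).
    ranges = [0] * 40 + [1] * 25 + [2] * 10 + [3] * 5 + [9] * 12 + [10] * 8
    print(finite_range_probability(ranges))
    sizes = [1] * 40 + [3] * 25 + [6] * 15 + [380] * 20
    lo, hi, length = three_sigma_interval(sizes, 400)
    # Chebyshev: at least 1 - 1/3**2 = 8/9 of the sample lies inside the interval
    inside = sum(lo <= y / 400 <= hi for y in sizes) / len(sizes)
    print(lo, hi, length, inside, inside >= 1 - 1 / 3**2)
```

For a strongly bimodal sample the clipped interval tends to cover most of [0, 1], which is why its length is a convenient indicator of the transitional area.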

An important result, presented in the phase diagrams in Figs. 4-8, is that the area of the parameter space characterized by bimodal distributions of the epidemic range and of the number of infected nodes is large. Therefore, the appearance of the described bimodal distributions is generic and does not require fine-tuning of p and q.
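A phase diagram of this kind amounts to repeating the simulation at every point of a grid over the (p, q) square and storing a summary statistic per point. The sketch below illustrates the procedure for the mean and the standard deviation of the infected fraction. It assumes a discrete-time SIR update in which each infected node infects each susceptible neighbor with probability p per step and then recovers with probability q, uses a small hypothetical random test graph instead of an empirical network, and excludes q = 0, for which the assumed dynamics never terminates. `sir_size` and `phase_diagram` are illustrative names, not code from the paper.

```python
import math
import random


def sir_size(adj, source, p, q, rng):
    """Number of nodes ever infected in one discrete-time SIR run (assumed rule)."""
    ever, infected = {source}, {source}
    while infected:
        # a susceptible node is infected if at least one infectious neighbor succeeds
        newly = {v for u in infected for v in adj[u]
                 if v not in ever and rng.random() < p}
        # infectious nodes recover with probability q; new cases become infectious
        infected = {u for u in infected if rng.random() >= q} | newly
        ever |= newly
    return len(ever)


def phase_diagram(adj, source, p_values, q_values, runs, seed=0):
    """(p, q) -> (mean, standard deviation) of the infected fraction."""
    assert all(0 < q <= 1 for q in q_values), "q = 0 never terminates"
    rng = random.Random(seed)
    n = len(adj)
    table = {}
    for p in p_values:
        for q in q_values:
            fr = [sir_size(adj, source, p, q, rng) / n for _ in range(runs)]
            mean = sum(fr) / runs
            sd = math.sqrt(sum((f - mean) ** 2 for f in fr) / runs)
            table[(p, q)] = (mean, sd)
    return table


if __name__ == "__main__":
    rng = random.Random(5)
    n = 150
    # hypothetical test network: Erdos-Renyi graph with mean degree about 4
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 4.0 / (n - 1):
                adj[i].add(j)
                adj[j].add(i)
    source = max(adj, key=lambda v: len(adj[v]))
    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    table = phase_diagram(adj, source, grid, grid, runs=60)
    print("q \\ p " + " ".join(f"{p:5.1f}" for p in grid))
    for q in reversed(grid):  # high q at the top, as in a phase diagram
        row = " ".join(f"{table[(p, q)][0]:5.2f}" for p in grid)
        print(f"{q:5.1f}  {row}")
```

Printing the second tuple element instead of the first gives the analogue of a standard-deviation phase diagram on the same grid.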

Given the large degree of heterogeneity of empirical complex networks, an important follow-up question is how these phase diagrams of epidemic spreading depend on the choice of the initially infected node. Preliminary simulations show that large differences may exist. Still, the phase diagrams observed for initially infected nodes of very different degrees show important similarities, and we could roughly characterize them as qualitatively the same. This question is also of considerable practical importance, since the choice of the initially infected node may describe the difference between a random outbreak (any randomly selected node) and a terrorist act (a hub). A more elaborate study of the influence of the selection of the initially infected node is left for future dedicated work [16].

Finally, there remains the question of the dependence of the presented results on the particular empirical network used in the discussion up to this point. The described analysis was repeated on several other empirical networks: an undirected, unweighted network representing the topology of the US Western States Power Grid [17], the network of coauthorships between scientists posting preprints on the Astrophysics E-Print Archive between January 1, 1995 and December 31, 1999 [5], and a symmetrized snapshot of the structure of the Internet at the level of autonomous systems, reconstructed from BGP tables posted by the University of Oregon Route Views Project [18]. The form of the phase diagrams observed for the condensed matter collaboration network is the same as in the simulations on the other three empirical complex networks [5, 17, 18], although there are notable differences in quantitative details, as depicted in Fig. 9. The conclusion is that the qualitative features of the phase diagrams of disease spreading are the same for the SIR model on the different empirical complex networks studied. This finding is a strong indication that the form of the phase diagrams is generic across complex networks and that it reflects some fundamental dynamics of disease spreading.

FIG. 1: The number of infected nodes for the condensed matter collaboration network with p = 0.3 and q = 0.7, averaged over 10000 simulations.

FIG. 2: The number of infected nodes for the condensed matter collaboration network with p = 0.3 and q = 0.7, averaged over 10000 simulations.

FIG. 3: The distribution of the epidemic range for the condensed matter collaboration network with p = 0.3 and q = 0.7, averaged over 10000 simulations. The maximal distance from the initially infected node is 11.

FIG. 4: The cumulative probability for a finite epidemic range for the condensed matter collaboration network. Each point is obtained by averaging over 2000 simulations.



III. THEORETICAL MODEL FOR M-ARY TREES

The observed pattern of epidemic spreading calls for the identification of the underlying mechanism producing it. The structure of empirical complex networks, despite many generic properties, is very intricate, and many structural characteristics of complex networks might contribute to the pattern of epidemic spreading. Therefore it is reasonable to start from a simple structure that still incorporates the key network properties for the process of disease spreading. Many complex networks can be locally well approximated by a tree-like structure. In empirical networks we very rarely observe exact tree structures, and
FIG. 5: The average number of infected nodes for the condensed matter collaboration network. Each point is obtained by averaging over 2000 simulations.

FIG. 6: The standard deviation of the number of infected nodes for the condensed matter collaboration network. Each point is obtained by averaging over 2000 simulations.

FIG. 7: The convergence of the standard deviation of the number of infected nodes for p = 0.125 and q = 0.2. The convergence of the standard deviation to a nonvanishing value is a strong indication of a bimodal probability distribution.

FIG. 8: The length of the normalized ±3 standard deviation interval of the number of infected nodes for the condensed matter collaboration network. Each point is obtained by averaging over 2000 simulations.



we say that the local description of the network as a tree is a good approximation if the clustering is small. Empirical IT/communication networks, such as e-mail and Internet router networks, in general fit into this class [19]. Social contact networks tend to be more clustered, although exceptions exist, such as coauthorship networks in medicine [20]. In this paper we exploit these findings and consider disease spreading on an m-ary tree as a starting model for the study of the phase diagram of epidemic spreading. In the m-ary trees considered in this paper, each node has m children. The simplicity of this starting model allows a transparent analysis and also probes the relevance of the tree structure itself, as the backbone of the complex network structure, for the disease spreading dynamics [21]. An additional argument for considering disease spreading on tree-like networks is that the results obtained in this setting may serve as useful lower bounds for the extent of disease spreading (expressed, e.g., as the number of infected nodes or the range of the epidemic). In general, the number of infected nodes and the range of the epidemic on a given empirical network are always at least as large as for an epidemic on any spanning tree with the same initial node. Hence the extent of disease spreading on a tree represents a lower bound for the extent of disease spreading on the complex network. This fact becomes especially useful when the result for a tree reaches



FIG. 9: The phase diagram of epidemic spreading for the average number of infected nodes for coauthorships in astrophysics [5] (left), the Internet at the level of autonomous systems [18] (center) and the US Western States Power Grid [17] (right).



its maximally allowed value. For example, if the probability of spreading to the entire tree is 1, the corresponding probability for the complex network will also be 1.
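The lower-bound argument can be checked numerically for the number of infected nodes with a standard coupling, sketched below under assumptions not stated in the text. Assuming a discrete-time SIR rule in which each infected node tries to infect each susceptible neighbor with probability p in every step and then recovers with probability q, the final infected set has the same distribution as the set of nodes reachable from the initial node through "open" directed edges, where u→v is open if at least one of u's attempts on v would succeed during its infectious period. Drawing these edge states once and reusing them on a breadth-first spanning tree makes the tree outbreak a subset of the network outbreak in every realization. The test graph and the helper names `bfs_spanning_tree` and `coupled_sizes` are hypothetical, and the sketch covers the number of infected nodes only.

```python
import random
from collections import deque


def bfs_spanning_tree(adj, root):
    """Breadth-first spanning tree of the component containing root."""
    tree = {v: set() for v in adj}
    seen, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree[u].add(v)
                tree[v].add(u)
                queue.append(v)
    return tree


def coupled_sizes(adj, tree, source, p, q, rng):
    """Outbreak sizes on the network and on a spanning tree, using shared randomness."""
    assert 0 < q <= 1
    duration = {}
    for u in adj:  # infectious period T_u ~ Geometric(q) on {1, 2, ...}
        t = 1
        while rng.random() >= q:
            t += 1
        duration[u] = t
    # directed edge u -> v is open if at least one of u's T_u attempts on v succeeds
    is_open = {(u, v): rng.random() < 1.0 - (1.0 - p) ** duration[u]
               for u in adj for v in adj[u]}

    def reach(graph):
        seen, stack = {source}, [source]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v not in seen and is_open[(u, v)]:
                    seen.add(v)
                    stack.append(v)
        return len(seen)

    return reach(adj), reach(tree)


if __name__ == "__main__":
    rng = random.Random(2)
    n = 300
    adj = {i: set() for i in range(n)}  # hypothetical Erdos-Renyi test network
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 4.0 / (n - 1):
                adj[i].add(j)
                adj[j].add(i)
    source = max(adj, key=lambda v: len(adj[v]))
    tree = bfs_spanning_tree(adj, source)
    pairs = [coupled_sizes(adj, tree, source, 0.3, 0.5, rng) for _ in range(300)]
    assert all(t <= g for g, t in pairs)  # the tree outbreak never exceeds the network one
    print(sum(g for g, _ in pairs) / len(pairs), sum(t for _, t in pairs) / len(pairs))
```

Because the tree edges are a subset of the network edges and share the same open/closed states, the reachable set on the tree can only be smaller, which is the realization-wise form of the lower bound.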

The study of epidemic spreading on an m-ary tree is performed both analytically and using computer simulations. The basic element for the analytic description of the epidemic on an m-ary tree is the probability distribution of the number of children that an infected node infects. Let us consider the problem of epidemic spreading in a complete bipartite graph consisting of two classes of nodes: s infected nodes (class I) and n susceptible nodes (class II). Each node of class I is connected to all nodes of class II. We are interested in the number of susceptible nodes that get infected during the course of the epidemic. The random variable of the number of nodes in class II that eventually get infected is denoted by $X_n^{(s)}$. The probability that $k \leq n$ nodes of class II eventually get infected is given by the expression
This distribution exhibits very interesting properties, such as multiple peaks, as depicted in Fig. 10.
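Because the exact form of $P(X_n^{(s)} = k)$ depends on the timing conventions of the SIR update, the sketch below derives the distribution under one explicitly assumed convention rather than reproducing the paper's formula. It assumes that each class I node makes T ≥ 1 infection attempts with T ~ Geometric(q) (infect first, then recover), so that, given the total number S of attempts (a sum of s geometric variables, i.e. negative binomial), each class II node escapes independently with probability $(1-p)^S$. This gives a mixture of binomials, $\sum_{t \geq s} \binom{t-1}{s-1} q^s (1-q)^{t-s} \binom{n}{k} r_t^k (1-r_t)^{n-k}$ with $r_t = 1-(1-p)^t$, whose components, one per value of S, are a natural source of multiple peaks. A direct step-by-step simulation of the same bipartite process is included as a cross-check; all function names are hypothetical.

```python
import math
import random


def binom_pmf(n, k, r):
    """Binomial(n, r) probability of k successes, computed in log space."""
    if r <= 0.0:
        return 1.0 if k == 0 else 0.0
    if r >= 1.0:
        return 1.0 if k == n else 0.0
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(r) + (n - k) * math.log1p(-r))


def bipartite_pmf(n, s, p, q, tol=1e-12, t_max=100_000):
    """P(X_n^(s) = k), k = 0..n, under the assumed infect-then-recover convention."""
    assert 0 < q <= 1 and 0 <= p <= 1
    pmf = [0.0] * (n + 1)
    mass, t = 0.0, s
    while mass < 1.0 - tol and t <= t_max:
        w = math.comb(t - 1, s - 1) * q**s * (1 - q) ** (t - s)  # P(S = t), negative binomial
        r = 1.0 - (1.0 - p) ** t  # a class II node is hit at least once in t attempts
        for k in range(n + 1):
            pmf[k] += w * binom_pmf(n, k, r)
        mass += w
        t += 1
    return pmf


def simulate_bipartite(n, s, p, q, rng):
    """Step-by-step simulation of the same bipartite process; returns infected count."""
    infectious, susceptible = s, n
    while infectious > 0:
        hit_prob = 1.0 - (1.0 - p) ** infectious  # any infectious class I node succeeds
        hits = sum(rng.random() < hit_prob for _ in range(susceptible))
        susceptible -= hits
        infectious = sum(rng.random() >= q for _ in range(infectious))  # recoveries
    return n - susceptible


if __name__ == "__main__":
    n, s, p, q = 350, 1, 0.45, 0.45
    pmf = bipartite_pmf(n, s, p, q)
    peaks = [k for k in range(1, n) if pmf[k] > pmf[k - 1] and pmf[k] > pmf[k + 1]]
    print("local maxima of the assumed pmf at k =", peaks)
    rng = random.Random(4)
    sims = [simulate_bipartite(n, s, p, q, rng) for _ in range(2000)]
    exact_mean = sum(k * w for k, w in enumerate(pmf))
    print("mean: mixture formula", round(exact_mean, 2), "simulation", sum(sims) / len(sims))
```

With these illustrative parameters (chosen to match those quoted for Fig. 10) the components for S = 1, 2, 3, ... are centered at well-separated values of k, which is how separate peaks can arise; whether this matches the paper's curve in detail depends on the convention assumed above.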

The probability distribution of the range of epidemic spreading on an m-ary tree can be calculated within the theory of branching processes [22]. The random variable $Z_n$ representing the number of infected nodes at the n-th level of the tree (n > 1) can be represented as

$$Z_n = \sum_{i=1}^{Z_1} Z_n^{(i)}, \qquad (2)$$

where $Z_n^{(i)}$ is the random variable of the number of infected nodes at level n that got infected via the i-th infected node at the first level. At the first level we have $Z_1 \sim X_m^{(1)}$.



The generating functions $F_n(s) = \sum_{k=0}^{\infty} P(Z_n = k)\, s^k$ are found to satisfy the relation

$$F_n(s) = F(F_{n-1}(s)), \qquad (3)$$
FIG. 10: The probability distribution of $X_n^{(s)}$ for p = 0.45, q = 0.45, s = 1 and n = 350, averaged over 50000 simulations. The results of simulations are depicted by circles and the full line represents the analytically obtained distribution.



where $F(s) = F_1(s)$. The probability that there are no infected nodes (at the end of the epidemic) at the n-th level, $F_n(0)$, is the probability that the range of the epidemic is $\leq n - 1$. Hence the range of the epidemic is n with probability

$$d_n = F_{n+1}(0) - F_n(0). \qquad (4)$$



For n > 1 the quantities $F_n(0)$ can be calculated iteratively from the recursion $F_n(0) = F(F_{n-1}(0))$, starting from $d_0 = F(0) = P(X_m^{(1)} = 0)$. An excellent agreement between the analytic and simulation results is found for a typical distribution of the range.
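The generating-function recursion and the expression for $d_n$ translate into a short iteration once the offspring generating function $F(s) = E(s^{Z_1})$ is known. Since the explicit distribution of $X_m^{(1)}$ is not reproduced in this text, the sketch below assumes that an infected node makes T ≥ 1 attempts with T ~ Geometric(q) and that each of its m children is then infected independently with probability $r_T = 1-(1-p)^T$, which gives $F(s) = \sum_{t \geq 1} q(1-q)^{t-1}(1 - r_t + r_t s)^m$. This is an illustration under that assumption, with hypothetical function names, compared against a direct branching simulation.

```python
import random


def offspring_pgf(x, m, p, q, tol=1e-15):
    """F(x) = E(x^Z1) for the assumed offspring law on an m-ary tree."""
    assert 0 < q <= 1
    total, w, t = 0.0, q, 1
    while w > tol * q:  # the remaining tail mass equals w / q
        r = 1.0 - (1.0 - p) ** t  # a child is infected within T = t attempts
        total += w * (1.0 - r + r * x) ** m
        w *= 1.0 - q  # P(T = t + 1) = q (1 - q)^t
        t += 1
    return total


def range_distribution(m, p, q, levels):
    """d_n = F_{n+1}(0) - F_n(0) for n = 0..levels-1, starting from F_0(0) = 0."""
    d, x = [], 0.0
    for _ in range(levels):
        x_next = offspring_pgf(x, m, p, q)  # F_{n+1}(0) = F(F_n(0))
        d.append(x_next - x)
        x = x_next
    return d


def simulate_range(m, p, q, levels, rng):
    """Range of one branching realization, or None if level `levels` is reached."""
    alive = 1  # infected nodes at the current level (the root at level 0)
    for level in range(levels):
        children = 0
        for _ in range(alive):
            t = 1
            while rng.random() >= q:  # infectious period T ~ Geometric(q)
                t += 1
            r = 1.0 - (1.0 - p) ** t
            children += sum(rng.random() < r for _ in range(m))
        if children == 0:
            return level
        alive = children
    return None


if __name__ == "__main__":
    m, p, q, levels = 2, 0.3, 0.3, 8
    d = range_distribution(m, p, q, levels)
    rng = random.Random(6)
    sims = [simulate_range(m, p, q, levels, rng) for _ in range(5000)]
    for n in range(levels):
        freq = sum(1 for r in sims if r == n) / len(sims)
        print(f"range {n}: generating function {d[n]:.4f}  simulation {freq:.4f}")
    print("P(range >= levels) in simulation:", sum(r is None for r in sims) / len(sims))
```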

The cumulative probability of a finite epidemic range, $d_{tot} = \sum_{i=0}^{\infty} d_i$, can be used for the definition of the phase diagram of epidemic spreading on an m-ary tree. Since $d_{tot} = \lim_{n \to \infty} F_n(0)$ is the smallest non-negative solution of $F(x) = x$, the nature of the solutions of the equation $F(x_f) = x_f$ serves as an equivalent tool for the definition of phases in this phase diagram. For $d_{tot} = 1$, for which a solution $x_f < 1$ does not exist, the range distribution is unimodal and the disease is locally contained. For



FIG. 11: The cumulative probability for a finite epidemic range for a binary tree with 12 levels. Each point is obtained by averaging over 10000 simulations.

FIG. 12: The average number of infected nodes for a binary tree with 12 levels. Each point is obtained by averaging over 10000 simulations.



$d_{tot} < 1$, where the solution $x_f < 1$ exists, the range distribution is bimodal and there are finite probabilities both for a locally contained outbreak and for an epidemic sweeping through the entire tree. A sharp boundary between these two phases exists, and it can be obtained from the condition $F'(1) = m\,E(X_1^{(1)}) = 1$. An interesting consequence of this relation for m = 1 is that for all values p < 1 (with q > 0) only the phase of local containment of the disease exists. The phase diagram of epidemic spreading on an m-ary tree obtained in simulations is depicted in Fig. 11, whereas those obtained from analytical considerations for m-ary trees are given in Fig. 15.
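The boundary condition can be made concrete under an explicit assumption about the offspring law: an infected node makes T ≥ 1 attempts with T ~ Geometric(q), and each of its m children is infected independently with probability $1-(1-p)^T$. Then $E(X_1^{(1)}) = \sum_{t \geq 1} q(1-q)^{t-1}\left(1-(1-p)^t\right) = p/(p+q-pq)$, so $F'(1) = mp/(p+q-pq) = 1$ gives the critical line $q_c(p) = (m-1)p/(1-p)$; for m = 1 this yields $q_c = 0$, consistent with the statement that only local containment occurs when q > 0. The sketch computes $d_{tot}$ as the limit of $F_n(0)$ (the smallest fixed point of F in [0, 1]) on both sides of this line. It is an illustration under the assumed convention, not the paper's derivation, and the function names are hypothetical.

```python
def offspring_pgf(x, m, p, q, tol=1e-15):
    """F(x) = E(x^Z1) for the assumed offspring law on an m-ary tree."""
    assert 0 < q <= 1
    total, w, t = 0.0, q, 1
    while w > tol * q:  # the remaining tail mass equals w / q
        r = 1.0 - (1.0 - p) ** t
        total += w * (1.0 - r + r * x) ** m
        w *= 1.0 - q
        t += 1
    return total


def offspring_mean(m, p, q):
    """F'(1) = m E(X_1^(1)) = m p / (p + q - p q) under the assumed convention."""
    return m * p / (p + q - p * q)


def critical_q(m, p):
    """q on the boundary F'(1) = 1; requires p < 1."""
    return (m - 1) * p / (1 - p)


def finite_range_total(m, p, q, max_iter=100_000, eps=1e-13):
    """d_tot = lim F_n(0): iterate x <- F(x) from x = 0 (smallest fixed point)."""
    x = 0.0
    for _ in range(max_iter):
        x_next = offspring_pgf(x, m, p, q)
        if abs(x_next - x) < eps:
            return x_next
        x = x_next
    return x


if __name__ == "__main__":
    m, p = 2, 0.3
    qc = critical_q(m, p)
    print(f"q_c = {qc:.4f}")
    for q in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8):
        side = "bimodal" if q < qc else "unimodal"
        print(f"q = {q:.1f}  F'(1) = {offspring_mean(m, p, q):.3f}  "
              f"d_tot = {finite_range_total(m, p, q):.4f}  ({side})")
```

Close to the critical line the fixed-point iteration converges slowly, so values of $d_{tot}$ printed there should be read as approximate.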

The total number of infected nodes in an m-ary tree with n levels can be represented by the variable $Y_n = \sum_{i=0}^{n} Z_i$, where $Z_0 = 1$ accounts for the initially infected node.



FIG. 13: The standard deviation of the number of infected nodes for a binary tree with 12 levels. Each point is obtained by averaging over 10000 simulations.

FIG. 14: The normalized ±3 standard deviation interval for the number of infected nodes for a binary tree with 12 levels. Each point is obtained by averaging over 10000 simulations.



The analytic expressions for the expectation and the variance of the number of infected nodes $Y_n$ are as follows
FIG. 15: Phase diagrams of epidemic spreading for m-ary trees for m = 2 (left), m = 3 (center), and m = 4 (right) obtained from analytical considerations. The shaded region corresponds to unimodal behavior and the unshaded region to bimodal behavior.
These analytical results are in excellent agreement with the simulations on m-ary trees. The phase diagrams for the average number, the standard deviation and the normalized ±3σ interval of infected nodes are presented in Figures 12, 13 and 14, respectively.
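Because the closed-form expressions for $E(Y_n)$ and $Var(Y_n)$ are not reproduced in this text, here is one way to obtain them numerically from the offspring law. Conditioning on the first level gives $Y_n = 1 + \sum_{i=1}^{Z_1} Y_{n-1}^{(i)}$ with independent copies $Y_{n-1}^{(i)}$, so with $\mu = E(Z_1)$ and $\sigma^2 = Var(Z_1)$ one has $E(Y_n) = 1 + \mu E(Y_{n-1})$ and $Var(Y_n) = \mu\, Var(Y_{n-1}) + \sigma^2 E(Y_{n-1})^2$, starting from $E(Y_0) = 1$, $Var(Y_0) = 0$. The sketch evaluates μ and σ² under an assumed convention (T ≥ 1 attempts with T ~ Geometric(q), each child hit independently with probability $1-(1-p)^T$) and checks the mean against a direct simulation; the function names are hypothetical, and the recursion need not coincide in form with the paper's expressions.

```python
import random


def geometric_pgf(c, q):
    """E(c^T) for T ~ Geometric(q) on {1, 2, ...}."""
    return q * c / (1.0 - (1.0 - q) * c)


def offspring_moments(m, p, q):
    """Mean and variance of Z_1 (infected children of one node) under the assumed law."""
    c = 1.0 - p
    e_r = 1.0 - geometric_pgf(c, q)                                   # E(r_T)
    e_r2 = 1.0 - 2.0 * geometric_pgf(c, q) + geometric_pgf(c * c, q)  # E(r_T^2)
    mean = m * e_r
    second = m * e_r + m * (m - 1) * e_r2  # E(Z_1^2) from the conditional binomial
    return mean, second - mean**2


def infected_moments(m, p, q, n):
    """E(Y_n) and Var(Y_n) from the first-level decomposition."""
    mu, var_z = offspring_moments(m, p, q)
    e, v = 1.0, 0.0  # Y_0 = 1: only the initially infected root
    for _ in range(n):
        e, v = 1.0 + mu * e, mu * v + var_z * e * e
    return e, v


def simulate_total(m, p, q, n, rng):
    """Total number of infected nodes in levels 0..n of one realization."""
    total, alive = 1, 1
    for _ in range(n):
        children = 0
        for _ in range(alive):
            t = 1
            while rng.random() >= q:  # infectious period T ~ Geometric(q)
                t += 1
            r = 1.0 - (1.0 - p) ** t
            children += sum(rng.random() < r for _ in range(m))
        total += children
        alive = children
        if alive == 0:
            break
    return total


if __name__ == "__main__":
    m, p, q = 2, 0.3, 0.3
    for n in range(0, 12, 2):
        e, v = infected_moments(m, p, q, n)
        print(f"n = {n:2d}  E(Y_n) = {e:8.3f}  sd(Y_n) = {v ** 0.5:8.3f}")
    rng = random.Random(8)
    sims = [simulate_total(m, p, q, 11, rng) for _ in range(2000)]
    print("simulated mean for n = 11:", sum(sims) / len(sims))
```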



IV. DISCUSSION AND CONCLUSIONS 

The main result of the analysis of disease spreading on regular trees is that this simple model reproduces the main qualitative characteristics of the phase diagrams of disease spreading observed on empirical complex networks. The region of high q and low p, where local containment of the disease dominates, and the region of low q and high p, where the onset of an epidemic is almost certain, are connected by a large transitional area where local containment and epidemic spread have comparable probabilities. The insight provided by the analysis of regular trees suggests the following mechanism producing the observed phase diagrams of disease spreading. The process of disease spreading is an interplay of two driving forces: the stochastic nature of the SIR model (and of other epidemiological models), which at any step allows the possibility that the disease will not be propagated further (as an extreme demonstration, note that on an m-ary tree with m = 1 the spreading of the disease is always contained), and the exponentially growing number of paths through which the disease may spread. When the extinguishing nature of the SIR model dominates, local containment prevails. A very large number of disease spreading pathways, on the other hand, strongly stimulates the onset of an epidemic. A situation in which these two driving forces are largely in balance is realized in the transitional area, where both typical regimes are comparably probable.

Disease spreading on an m-ary tree in the SIR model exhibits the same qualitative, or even semi-quantitative, features of phase diagrams as those observed in empirical complex networks. This is an important result, since it indicates that the structure of the m-ary tree is sufficient to reproduce the main features of the phase diagrams of epidemic spreading. At a quantitative level, however, many peculiarities are observed in disease spreading on empirical networks. A reasonable path for research efforts to explain these peculiarities is to study the effects of network structural features more complex than the underlying (spanning) trees. The effects of nontrivial degree distributions and the influence of cycles are natural first stops along this path.